A multidevice and multimodal dataset for human energy expenditure estimation using wearable devices

We present a multi-device and multi-modal dataset, called WEEE, collected from 17 participants while they performed different physical activities. WEEE contains: (1) sensor data collected using seven wearable devices placed on four body locations (head, ear, chest, and wrist); (2) respiratory data collected with an indirect calorimeter serving as ground-truth information; (3) demographics and body composition data (e.g., fat percentage); (4) intensity level and type of physical activities, along with their corresponding metabolic equivalent of task (MET) values; and (5) answers to questionnaires about participants’ physical activity level, diet, stress and sleep. Thanks to the diversity of sensors and body locations, we believe that the dataset will enable the development of novel human energy expenditure (EE) estimation techniques for a diverse set of application scenarios. EE refers to the amount of energy an individual uses to maintain body functions and as a result of physical activity. A reliable estimate of people’s EE thus enables computing systems to make inferences about users’ physical activity and help them promote a healthier lifestyle.


Background & Summary
Human energy expenditure (EE) refers to the amount of energy an individual uses to maintain essential body functions (respiration, circulation, digestion) and as a result of physical activity 1. Knowledge of the expended energy or calories could help people (e.g., athletes and people with obesity or diabetes) plan their physical activity to lead a healthier lifestyle 2. Additionally, it could be used to enable nutrition coaching for weight management purposes 3. Devising methods for EE estimation (EEE) is a key enabler of these intervention strategies and is the core goal of the dataset presented in this paper.
The gold-standard EE measurement methods are direct calorimetry (which measures body heat while the subject is inside a chamber), indirect calorimetry (which relies on a mouthpiece worn for respiratory gas analysis) and doubly labeled water (which measures carbon dioxide production during the interval between the first and last body water samples) [3][4][5]. Such techniques require cumbersome and expensive equipment and are not feasible for measuring EE in free-living conditions for specific activities on a minute-by-minute basis. Measuring EE in real-world scenarios in a fine-grained manner would provide valuable information regarding people's physical activity and enable personalized and timely recommendations.
Considering the cost and practical limitations of gold-standard methods combined with the proliferation of ubiquitous computing 3 , several researchers started exploring the use of mobile and wearable devices for EEE [6][7][8] . Such devices are suitable for continuous monitoring of EE because they are unobtrusive and do not hamper the natural behavior of the user in free-living conditions. Additionally, they have the potential to provide a cheap and reliable solution to this problem. Despite the considerable research progress in sensor-based EEE, several challenges remain open. In particular, it is not evident which type of sensor, body position or combination thereof would enable reliable EEE. Also, there is a lack of studies investigating the quality of data and how it influences the robustness of EEE. Such investigations are impeded by the lack of sensor-diverse, multimodal and publicly available datasets, which could potentially enable the development of more accurate EEE techniques 4,7 . While there exist commercial wearable devices that measure EE (mainly using demographics data and accelerometer sensor), it is not clear how they compare to gold-standard measurements (e.g., indirect calorimetry) and new sensor-based techniques (e.g., physiological sensors).
To overcome such barriers and foster further developments in EEE, in this paper we introduce a new multimodal dataset collected from 17 participants using 7 wearable devices, each containing multiple sensors. The goal of the dataset is to enable the design and development of new sensor-based EEE techniques during rest and physical activity. To this end, we designed and ran a data collection protocol consisting of three activities (resting, cycling and running), each performed for 10 minutes. We picked these activities because they involve movements of different intensity levels (e.g., light, moderate and vigorous). In addition, they require full-, half- or no-body movement, and are thus representative of physical activities performed in everyday life, as discussed in 3. Each physical activity was performed at two intensity levels to cover a wider range of movement intensity and to explore the EE changes across such intensities. For instance, participants ran at two different speeds for 5 minutes each.
The dataset is collected using an indirect calorimeter, a headband, earbuds, two chest belts (a commercial and a gold-standard device), and three wristbands (a research-grade and two commercial devices). Each of the following types of sensor data is provided by at least one device: oxygen consumption (VO2), fraction of oxygen in expired breath (FeO2), air moved by the lungs (Ve), volume breathed in a breath (Tv), breaths per minute (BR), humidity (H), temperature (T), pressure (P), acceleration (ACC), gyroscope (GYRO), photoplethysmography (PPG), electrocardiography (ECG), electrodermal activity (EDA), skin temperature (TEMP) and electroencephalography (EEG). The dataset also includes information derived from the sensors, such as heart rate (HR), heart rate variability (HRV), breathing rate (BR), body posture and more. Table 1 presents an overview of existing datasets in the literature that enable EE modeling using sensor data. Only two of the existing datasets are publicly available for download 3,9; they are marked with "Yes" in the "Publicly Available" column of the table. In comparison to these datasets, our dataset contains a higher number of unique data sources (18 in total). Further, it is the only dataset that contains ACC and HR from multiple body locations, such as the ear, wrist, and chest, which allows researchers to investigate novel techniques for EE estimation. Only Bouarfa et al. 10 investigated the use of ACC placed on the ear to estimate EE. However, estimating EE from ACC and HR data collected from the ear has not yet been explored. Additionally, WEEE contains data from both medical-grade devices (e.g., Zephyr BioHarness) and commercial devices (e.g., Fitbit Sense and Apple Watch), which enables the comparison of HR measurements between such devices.

Methods
To enable multimodal EE modeling, we designed a controlled experiment and asked participants to perform a set of pre-defined activities. We opted for a controlled study because, despite its constraints, it enables a detailed analysis of the phenomenon under investigation and makes the data collection procedure easier to replicate. In this section, we provide details about the participants, the data collection setup and protocol, and the collected data.
Participants. We recruited 17 participants (12 males and 5 females) using snowball sampling 11. Participants were between 23 and 41 years old (MEAN = 30, STD = 5) and had an average BMI of 24.5 (STD = 2.9). The study was conducted following the ethical regulations at our institution. All the participants signed an informed consent form and agreed to their data being used for research purposes. Participants were instructed to wear comfortable attire for the experiment. Also, we asked participants to be in a rested and fasting state by refraining from endurance training for 24 hours prior to the study and avoiding caffeine, tobacco, alcohol, and food intake for 3 hours before the experiment. Participants were compensated with a £20 gift card.
Setup. As preparation for each experiment, we charged the devices and visually verified that the clock of each device matched the same time reference to ensure synchronization among the devices. This included checking the date, time (in terms of hours, minutes and seconds) and time zone. Before the experiment, participants completed a set of questionnaires regarding their eating habits, sleep, stress and physical activity level. We then asked the participants to step on the QardioBase smart scale (https://www.qardio.com/qardiobase-smart-scale-iphone-android/) to measure body composition metrics (e.g., weight, muscle percentage). Next, we placed the devices on the participant as follows: VO2 Master Analyzer (https://vo2master.com/) on the face, Nokia Bell Labs earbuds 12,13 on the right ear, Muse S headband (https://choosemuse.com/muse-s/) on the head, Empatica E4 wristband 14 on the non-dominant hand, Zephyr BioHarness chest belt (https://www.zephyranywhere.com/) and Wahoo Tickr chest strap (https://eu.wahoofitness.com/devices/heart-rate-monitors) on the chest, and Fitbit Sense watch (https://www.fitbit.com/global/us/products/smartwatches/sense) and Apple Watch (https://www.apple.com/apple-watch-series-6/index.html) on the dominant hand. Figure 1 presents an overview of the study setup, the devices used and their locations. We ensured proper attachment of the face mask and calibration of the flow sensor, as recommended in 3. The Muse S headband, Zephyr BioHarness and Wahoo chest belts were moistened with water before attaching them to the participant's body. The earbuds are a multi-sensory earable device under development by Nokia Bell Labs, which has already been tested in 12,13,15. The VO2 Master Analyzer is smaller than the major portable metabolic analyzer brands, which makes it a suitable option for VO2 measurements. Montoye et al.
16 have shown acceptable validity and reliability of this device in comparison to gold-standard measurements. Furthermore, the VO2 Master Analyzer is compatible with other devices, such as the Wahoo Tickr (validated in 17), which makes it easier for researchers to obtain additional data (e.g., heart rate) together with VO2 measurements. The Zephyr BioHarness chest belt contains an ECG sensor, which provides heart rate measurements. Nazari et al. 18 have shown evidence of the reliability and validity of heart rate measurements across multiple contexts using this device. The Zephyr BioHarness has also been used in other studies 3,[19][20][21]. The Empatica E4 device is a watch-like, multi-sensor device. It is light, easy to use and comfortable to wear, which makes it suitable for monitoring people's energy expenditure. Additionally, the Empatica E4 provides the raw sensor data, encrypts the data during transfer and does not store users' personal data, which helps preserve the privacy of the study participants. The Empatica E4 has been extensively used in the literature for energy expenditure estimation 21, but also for other tasks 15,22,23. We chose the Fitbit and Apple Watch devices because they are among the most popular smartwatches available on the market, as shown in a recent article by The Economist magazine 24. Also, they have shown high accuracy for measuring heart rate during physical activities considered in our work (e.g., cycling, running) 25. We chose the Muse S device because it is a portable and unobtrusive brain-sensing headband and has been previously validated in the literature 15,26,27.
Table 1. Comparison of the existing datasets for energy expenditure modeling and our dataset. The table shows the dataset name and the paper where it was presented, the number of subjects in the existing datasets, the devices used, the types of sensors and the body locations at which sensors are placed during the data collection, as well as the number of physical activities. We linked the publicly available dataset name to the repository where it can be downloaded. In this literature review we favored work that collected data from more than one data source or body location. For a more detailed overview of other existing datasets please refer to Alvarez et al. 3. *Cycling performed at different intensities. **In the form of a shirt. ***Available by sending a request to the corresponding author. HR: heart rate, ACC: acceleration, VO2: oxygen consumption, VCO2: carbon dioxide exhaled, ECG: electrocardiography, BR: breathing rate (breaths per minute), EDA: electrodermal activity, TEMP: skin temperature, HRV: heart rate variability, RR: interbeat interval, FeO2: fraction of oxygen in expired breath, Ve: air moved by the lungs, Tv: volume breathed in a breath, H: humidity, T: temperature, P: pressure, GYRO: gyroscope, PPG: photoplethysmography, EEG: electroencephalography.
www.nature.com/scientificdata
Procedure. Figure 2 depicts an overview of the study protocol. Participants followed a predefined set of activities, similar to 28,29, grouped into three parts: resting, cycling and running. During resting, participants were asked to sit on a chair and to stand, for 5 minutes each, to obtain physiological data during a resting state. After that, they cycled on an indoor bike and ran on a treadmill, for 10 minutes each. Both the cycling and the running activity were performed at two intensity levels, each lasting 5 minutes. We used a window of 5 minutes for each activity to reach a steady-state EE, as recommended in 3. The intensities of these activities were selected by the participants to represent their individual habits, as suggested in previous work 30,31. The total duration of the experiment was 30 minutes.
For consistency, the bicycle resistance and treadmill inclination were kept the same for all participants.
We picked resting, cycling and running activities because these activities involve movements of different intensity levels (e.g., light, moderate and vigorous). For instance, sitting or standing requires no or light movement, cycling requires half-body or moderate movement, and running requires full-body or vigorous movement. We ran the protocol from low to high intensity to prevent high-intensity activities from affecting the measurements taken during low-intensity ones.
Collected data. We collect five types of data: sensor data, respiratory gases, demographics and body composition, activity data and questionnaire data, explained as follows.
Sensor and respiratory gases. Table 2 shows an overview of the characteristics of the devices used to collect the WEEE dataset. The table presents the device used, the device location, the type of data collected with each device, as well as the paper(s) that validated the sensor readings of the device. The table shows that WEEE contains data from 8 different devices (including an indirect calorimeter serving as ground-truth information) placed on 5 unique body locations. Some of the sensors (e.g., ACC, PPG) are available at more than one body location (e.g., ear, wrist, chest).
Demographics and body composition. To collect body composition and demographics data, we use the QardioBase smart scale. In particular, we collect participants' gender, age, height, weight, percentage of body fat, muscle, bone and water, and body mass index (BMI). Muscle mass percentage is calculated as the percentage of muscle in the body as compared to total body weight. Table 3 shows the mean (standard deviation) of the demographics and body composition data for all participants as well as for participants with female or male body types. The range of BMI is 20 to 30 kg/m² (MEAN = 24.5, STD = 2.9).
Activity data. We derive labels regarding the activity performed from the protocol. Also, we kept notes of the intensity level (speed) of each activity. To enable further comparisons, we include the metabolic equivalent of task (MET) values for each activity type based on intensity, as defined in the compendium of physical activities 32.
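As a rough illustration of how the MET labels could support such comparisons, the sketch below converts a MET value into an approximate energy expenditure using the common convention that 1 MET corresponds to about 1 kcal per kilogram of body weight per hour. The function name, the example MET value and the body weight are hypothetical and are not taken from the dataset.

```python
def met_to_kcal(met: float, weight_kg: float, minutes: float) -> float:
    """Approximate energy expenditure in kcal.

    Assumes the common convention 1 MET ~ 1 kcal per kg of body weight per hour.
    """
    if met < 0 or weight_kg <= 0 or minutes < 0:
        raise ValueError("met and minutes must be >= 0 and weight_kg must be > 0")
    return met * weight_kg * (minutes / 60.0)


# Hypothetical example: a 70 kg participant performing an activity
# labeled with 8.0 MET for one 5-minute protocol segment.
estimate = met_to_kcal(8.0, 70.0, 5.0)
print(f"Estimated EE: {estimate:.1f} kcal")
```

Such an estimate could then be compared against the EE derived from the indirect calorimeter for the same segment.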
Questionnaires. We assess participants' physiological and physical state before the experiment using validated questionnaires. In particular, we evaluate their sleep quality over the past month using the Pittsburgh Sleep Quality Index (PSQI) 33 and their sleepiness level before the experiment using the Stanford Sleepiness Scale (SSS) 34. Participants also report their stress level using the Perceived Stress Scale (PSS) 35, their physical activity level using the International Fitness Scale (IFIS) 36, and their readiness for physical activity using the Physical Activity Readiness Questionnaire (PAR-Q) 37. Finally, they complete the "How healthy is your diet?" questionnaire, which measures the nutritional value of their diet, a factor that has an impact on EE.

Data records
The raw data can be found at Zenodo 38 and the dataset is available for download at https://doi.org/10.5281/zenodo.6420886. The data of each participant have been anonymized using an alphanumeric identifier of the form P#, which we refer to as the participant identifier, and are placed in a separate folder named after that identifier (e.g., P1). In addition to one folder per participant, the dataset contains the following files and folders: Demographics.csv contains demographics (e.g., gender, age) and body composition data (e.g., BMI, percentage of fat, muscle, water, bone) for each participant in an anonymous format; Study_Information.csv contains the start and end time of each study condition (e.g., start time of the sitting or cycling activity), the speed of cycling/running and the MET information for each activity; the Questionnaires folder contains the answers to the pre-study questionnaires regarding participants' physiological state. Within each participant folder, there are five other folders, namely VO2, EARBUDS, E4, ZEPHYR, and MUSE, which contain the raw data obtained from each device during data collection. Table 4 provides an overview and description of the main files inside a participant folder.
Missing data. The Muse S data of participant P02 are missing due to a malfunction in the streaming of the sensor data to the third-party app MindMonitor (https://mind-monitor.com/), which we used to collect the data. Part of the VO2 data of P03 and P12 during the cycling condition and of P16 during the running condition was lost due to issues with the VO2 sensor of the indirect calorimeter.
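Given the folder layout and the missing recordings described above, a quick completeness check can be useful before analysis. The following is a minimal sketch, assuming the dataset has been unpacked so that participant folders whose names start with "P" sit directly under a root directory. The root path "WEEE" and the function name are hypothetical. The check only detects absent or empty device folders, not partially lost recordings such as the VO2 gaps mentioned above.

```python
from pathlib import Path

# Device folder names as listed in the Data records section.
DEVICE_FOLDERS = ("VO2", "EARBUDS", "E4", "ZEPHYR", "MUSE")


def find_missing_device_data(root):
    """Map each participant folder name to its absent or empty device folders."""
    missing = {}
    participants = sorted(p for p in Path(root).glob("P*") if p.is_dir())
    for participant in participants:
        absent = [
            device
            for device in DEVICE_FOLDERS
            if not (participant / device).is_dir()
            or not any((participant / device).iterdir())
        ]
        if absent:
            missing[participant.name] = absent
    return missing


if __name__ == "__main__":
    # "WEEE" is a hypothetical path to the unpacked dataset.
    for name, devices in find_missing_device_data("WEEE").items():
        print(f"{name} is missing: {', '.join(devices)}")
```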

Technical Validation
We evaluate the technical validity of the dataset, i.e., whether the sensors measure what they are expected to measure, in three ways: (1) by providing descriptive statistics of the data in comparison to the device manuals, (2) by investigating the relationship between physiological signals collected from different body locations and (3) by comparing the changes in sensor data for different physical activities, as suggested in 39. Table 5 presents descriptive statistics of the collected data for each device together with reference values obtained from the devices' manuals. These statistics support the validity of the dataset because the minimum and maximum values obtained from the sensors are within the expected range for the majority of the sensors. For instance, the minimum (47) [...] with the E4 devices are within the ±2 range. These observations confirm that the data in the WEEE dataset are as expected according to the devices' manuals. We observe that the minimum HR values derived from the E4 and the earbuds fall below the expected minimum; this could be due to the presence of motion artifacts in the PPG signal from which HR is derived. We recommend careful identification and removal of artifacts in the PPG signal before further analysis.

Table 4 (excerpt). Main files inside a participant folder.
ACC.csv: Column 2, Y-axis of accelerometer sensor. Column 3, Z-axis of accelerometer sensor.
HR.csv: Column 1, average heart rate extracted from the BVP signal.
IBI.csv: Column 1, the time of the detected inter-beat interval expressed in seconds (s). Column 2, the distance of the current beat from the previous beat in seconds (s).
TEMP.csv
info.txt

To further evaluate the validity of our dataset, we explore the association between physiological signals collected from different body locations. Given that HR and ACC data are available from multiple body positions, we investigate the relationship between such data collected from different body positions. To perform this analysis, we compute the Pearson product-moment correlation when data samples conform to a Gaussian distribution and the Spearman rank correlation otherwise, as is common procedure in the literature 40. We use the Shapiro-Wilk test to verify whether the data conform to a Gaussian distribution. We test the p-values against both the p < 0.05 threshold and the corrected threshold ($p_c = \frac{p}{n} = 0.01$, where n refers to the number of body locations or devices and is equal to 5), to account for the Bonferroni correction 41. Figure 3 presents the heatmap with the correlation coefficients between sensor data collected from different devices. As expected, we observe that the motion data (e.g., ACC, GYRO) collected from the ear, chest or wrist are significantly positively correlated with each other (p < 0.01).
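The procedure described above can be sketched with SciPy as follows. This is an illustrative outline, not the code used to produce Figure 3: the function name, the 0.05 level used for the Shapiro-Wilk check and the synthetic inputs are assumptions. The sketch treats a pair of signals as Gaussian only if both pass the Shapiro-Wilk test.

```python
import math
from statistics import NormalDist

from scipy import stats


def correlate(x, y, alpha=0.05, n_comparisons=5):
    """Pearson if both samples look Gaussian (Shapiro-Wilk), else Spearman."""
    # stats.shapiro returns (statistic, p-value); index 1 is the p-value.
    gaussian = all(stats.shapiro(v)[1] > alpha for v in (x, y))
    if gaussian:
        method = "pearson"
        r, p = stats.pearsonr(x, y)
    else:
        method = "spearman"
        r, p = stats.spearmanr(x, y)
    corrected_alpha = alpha / n_comparisons  # Bonferroni: 0.05 / 5 = 0.01
    return {
        "method": method,
        "r": float(r),
        "p": float(p),
        "significant": bool(p < alpha),
        "significant_corrected": bool(p < corrected_alpha),
    }


# Synthetic, deterministic inputs for illustration only (not WEEE data).
gaussian_like = [NormalDist().inv_cdf((i + 0.5) / 100) for i in range(100)]
skewed = [math.exp(i / 10) for i in range(100)]

print(correlate(gaussian_like, [2 * v + 1 for v in gaussian_like]))
print(correlate(skewed, [v ** 2 for v in skewed]))
```

In practice, x and y would be two time-aligned signals of equal length, for example HR from the earbuds and HR from the chest belt resampled to a common rate.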
We further explore the difference in sensor data for each physical activity. Figure 4 shows the distribution of EE measured using the indirect calorimeter (left), and of HR (middle) and GYRO (right) data measured using the earbuds. As expected, the average EE during activities with high-intensity movements is higher than during those with low-intensity movements. For instance, the average EE during running or cycling is higher than during resting activities (e.g., sitting and standing). We observe similar patterns for the HR and GYRO sensor data. This exploration of the data further supports the validity and reliability of the collected data.
The WEEE dataset fosters research and development of new solutions to the following problems:
• Device/Sensor Fusion: The dataset contains raw measurements from sensors in multiple devices placed on the head, ear, wrist and chest. Thanks to its large number of wearable devices and sensor types, the dataset enables exploration of which sensor (device) or combination thereof enables a more accurate measurement of EE. For instance, the dataset enables exploring different sensor (device) fusion strategies such as stacking sensor channels one after the other, multi-input architectures, ensemble methods, and feature concatenation.
• Sensor Location: Researchers may further explore how the sensor position impacts EEE. To the best of our knowledge, our dataset enables for the first time the use of heart rate and motion data collected from the ear for EEE and its comparison to the same data sources collected from other body positions.
• Individual Characteristics: The literature has shown that age, gender, body size and composition have an impact on EE. For instance, individuals with a larger body require a higher amount of energy than those with a smaller body size because of the amount of tissue 4. Our dataset enables a systematic, data-driven exploration of the impact of such individual characteristics on EEE.
• Context Information: Several researchers have shown that combining human activity recognition and EEE generally leads to better EEE 6. Our dataset contains information about the type of activity that participants performed and its intensity level, which allows researchers to investigate methods that simultaneously recognize the activity type and intensity level and estimate EE, as well as to understand the impact of this context on EEE.
• Physiological Conditions: The dataset supports investigating the impact of physiological conditions (e.g., physical activity level, diet, stress, and sleep) on the overall EEE.
• Data Quality: The dataset supports exploring the impact of data quality (e.g., presence of noise and missing data) on the overall EEE. For instance, researchers could develop new methods that leverage the data from available sensors to handle noisy data, missing data points, and missing sensors or devices.
While the WEEE dataset opens up novel opportunities for computing systems that monitor energy expenditure, our approach presents some limitations and opportunities for further improvement. The first limitation stems from the low number of physical activities investigated. We made this choice to avoid a long experiment protocol and to avoid causing fatigue to our study participants. Future work should consider extending our approach by adding a wider variety of physical activities. Although our dataset contains only three activities, each of them was performed at two intensity levels, which makes the dataset diverse in terms of activity types and intensity levels.
Indirect calorimetry data. The data collected from the indirect calorimeter can be used as ground truth in future analyses. To prepare the indirect calorimetry data for analysis, the VO2 data should first be cleaned, for instance, by removing the values recorded when the VO2 sensor did not register any data (e.g., VO2 = 0). Then the VO2 data should be converted to EE using equations from the literature, e.g., in 4.
Earbuds data. To use the data collected from the earbuds, one should first convert the raw ACC data to milli-g by multiplying it by 0.061 and the raw GYRO data to milli-dps (degrees per second) by multiplying it by 17.5. This converts the raw integer values coming from the sensor into a more usable format (i.e., milli-g and milli-dps). Then, the direct current (DC) offset should be removed from the GYRO data by applying a Butterworth band-pass filter. To clean the PPG signal, one could apply a Butterworth band-pass filter and then extract HR using the NeuroKit library 45.
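The first preparation steps for the indirect calorimetry and earbuds data could be sketched as follows. The scaling factors 0.061 and 17.5 come from the text. Everything else is an assumption made for illustration: the function names, the (timestamp, value) sample layout, VO2 expressed in mL/min, and the widely used approximation of about 5 kcal per litre of consumed oxygen, which is not necessarily the equation used in 4.

```python
KCAL_PER_LITRE_O2 = 5.0      # assumed approximation, not taken from the paper
ACC_SCALE_MILLI_G = 0.061    # raw ACC -> milli-g (from the text)
GYRO_SCALE_MILLI_DPS = 17.5  # raw GYRO -> milli-dps (from the text)


def vo2_to_ee(samples):
    """Drop VO2 = 0 readings and convert VO2 (mL/min) to EE (kcal/min).

    `samples` is a list of (timestamp, vo2_ml_per_min) pairs; timestamps are
    kept so that the result can still be aligned with other devices.
    """
    return [
        (t, vo2 / 1000.0 * KCAL_PER_LITRE_O2)
        for t, vo2 in samples
        if vo2 > 0
    ]


def scale_earbud_sample(raw_acc, raw_gyro):
    """Convert one raw (x, y, z) ACC and GYRO reading to milli-g and milli-dps."""
    acc = tuple(a * ACC_SCALE_MILLI_G for a in raw_acc)
    gyro = tuple(g * GYRO_SCALE_MILLI_DPS for g in raw_gyro)
    return acc, gyro


# Hypothetical readings for illustration only.
print(vo2_to_ee([(0, 0.0), (1, 1000.0), (2, 2000.0)]))
print(scale_earbud_sample((1000, 0, -1000), (10, 0, -2)))
```

Filtering steps such as the Butterworth band-pass filter would follow these conversions.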
Wristband data. To clean the ACC and TEMP data, we suggest applying a central moving average filter with a window of 1 minute, similar to 23, and then computing the ACC magnitude. The EDA data should be cleaned using a first order Butterworth low-pass filter with a cut-off frequency of 0.6 Hz, similar to 42. The EDA data can further be decomposed into the tonic component (the slowly changing component) and the phasic component (characterized by skin conductance responses (SCRs), i.e., peaks that occur as a result of a stimulus), using the cvxEDA method proposed by Greco et al. 43. To clean the PPG data, a first order Butterworth FIR filter with a cut-off frequency of 5 Hz should be applied, as suggested in 44. The HR data can then be derived from PPG using the NeuroKit library 45.
Questionnaire data. Figures 5 to 15 present a summary of the answers received from all the participants for the PSQI, SSS, IFIS, PSS and "How healthy is your diet?" questionnaires. Such data can be used as additional information regarding the physical and physiological state of participants before the experiment.
Other data. The data from the Wahoo Tickr and Zephyr BioHarness are preprocessed and provided at a 1 Hz granularity. For this reason, data from such devices can be used as is.
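The wristband cleaning steps described above (central moving average, ACC magnitude and a first order low-pass filter for EDA) could be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the sampling rates must be supplied by the user (the example assumes 4 Hz for EDA), and the low-pass filter is a single forward pass of a first order Butterworth design obtained with the bilinear transform, so it introduces some phase lag.

```python
import math
from itertools import accumulate


def central_moving_average(signal, window):
    """Centered moving average over `window` samples; the window shrinks at the edges."""
    half = window // 2
    prefix = [0.0, *accumulate(signal)]  # prefix sums for fast window sums
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def acc_magnitude(x, y, z):
    """Euclidean norm of the three accelerometer axes, sample by sample."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]


def butter1_lowpass(signal, cutoff_hz, fs_hz):
    """First order Butterworth low-pass (bilinear transform), forward pass only."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    b = k / (1.0 + k)          # feed-forward coefficient (b0 = b1)
    a = (k - 1.0) / (k + 1.0)  # feedback coefficient
    prev_x = prev_y = signal[0] if signal else 0.0  # start at steady state
    out = []
    for x in signal:
        y = b * (x + prev_x) - a * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


# Hypothetical usage: smooth a short series, then filter EDA assumed at 4 Hz.
print(central_moving_average([1, 2, 3, 4, 5], window=3))
print(acc_magnitude([3], [4], [0]))
print(butter1_lowpass([0.0, 0.0, 0.0, 1.0, 1.0, 1.0], cutoff_hz=0.6, fs_hz=4.0))
```

For a 1-minute window, `window` would be set to 60 times the sampling rate of the signal being smoothed.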

Code availability
We provide the raw csv data files obtained during the data collection structured by user and device identifier. We did not implement any custom code to generate or process the data.

Fig. 15. How healthy is your diet? 36. Answers to the item "Salt".